This dataset contains information on all electronic traffic violations issued in Montgomery County.
It covers violations from 2012 to 2016, with more than 800,000 entries.
Let's see what we can find.
Data Summary:
## Date.Of.Violation Time.Of.Violation
## 3/17/2015 : 1281 23:20:00: 1218
## 5/20/2014 : 1222 23:30:00: 1208
## 11/24/2015: 1169 23:00:00: 1184
## 12/8/2015 : 1147 22:53:00: 1179
## 2/11/2015 : 1135 22:50:00: 1125
## 5/6/2014 : 1128 22:57:00: 1125
## (Other) :809616 (Other) :809659
## Violation.Description
## DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC CONTROL DEVICE INSTRUCTIONS : 64132
## FAILURE TO DISPLAY REGISTRATION CARD UPON DEMAND BY POLICE OFFICER : 39614
## DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION : 32341
## FAILURE OF INDIVIDUAL DRIVING ON HIGHWAY TO DISPLAY LICENSE TO UNIFORMED POLICE ON DEMAND: 21027
## DRIVER FAILURE TO STOP AT STOP SIGN LINE : 18965
## (Other) :640618
## NA's : 1
## Violation.Location Latitude
## IS 370 @ IS 270 : 1838 Min. :-77.64
## W/B IS 370 @ IS 270 : 1829 1st Qu.: 39.01
## 10901 WESTLAKE DRIVE : 1404 Median : 39.06
## WAYNE AVE @ DALE DR : 1278 Mean : 28.43
## CLOPPER RD E/B @ ORCHARD HILLS DR: 1272 3rd Qu.: 39.13
## RT 28 @ BLACKBERRY DR : 1217 Max. : 77.04
## (Other) :807860 NA's :72298
## Longitude Geolocation
## Min. :-94.61 : 72298
## 1st Qu.:-77.18 (-76.9907366666667, 39.045425) : 246
## Median :-77.08 (-76.91044, 39.109775) : 211
## Mean :-66.45 (-77.0271333333333, 38.9920483333333): 117
## 3rd Qu.:-77.02 (39.109775, -76.91044) : 116
## Max. : 77.19 (39.0991266666667, -77.0421983333333): 75
## NA's :72298 (Other) :743635
## Belts.Flag Personal.Injury Property.Damage Commercial.License
## No :787012 No :807397 No :802025 No :790389
## Yes: 29686 Yes: 9301 Yes: 14673 Yes: 26309
##
##
##
##
##
## Commercial.Vehicle Alcohol Work.Zone Violation.State
## No :810076 No :815021 No :816579 MD :718892
## Yes: 6622 Yes: 1677 Yes: 119 VA : 32483
## DC : 19889
## XX : 6621
## PA : 5838
## FL : 3545
## (Other): 29430
## Vehicle.Type Vehicle.Production.Year
## 02 - Automobile :704614 Min. : 0
## 05 - Light Duty Truck: 50439 1st Qu.:2001
## 28 - Other : 17523 Median :2005
## 03 - Station Wagon : 14740 Mean :2004
## 06 - Heavy Duty Truck: 8391 3rd Qu.:2009
## 01 - Motorcycle : 7947 Max. :9999
## (Other) : 13044 NA's :5143
## Vehicle.Manfacturer Vehicle.Model Vehicle.Color Caused.an.Accident
## TOYOTA : 88638 4S : 93512 BLACK :158699 No :798368
## HONDA : 83667 TK : 55091 SILVER :148362 Yes: 18330
## FORD : 78439 ACCORD : 28944 WHITE :120937
## TOYT : 46659 CIVIC : 26378 GRAY : 84933
## NISSAN : 41593 CAMRY : 25807 RED : 65285
## (Other):477692 (Other):586915 BLUE : 61060
## NA's : 10 NA's : 51 (Other):177422
## Gender Driver.City Driver.State
## F:271463 SILVER SPRING:200054 MD :739032
## M:544214 GAITHERSBURG : 83880 VA : 25038
## U: 1021 GERMANTOWN : 66838 DC : 24467
## ROCKVILLE : 65780 PA : 4431
## WASHINGTON : 23623 FL : 2868
## (Other) :376478 NY : 2510
## NA's : 45 (Other): 18352
The data covers many vehicle types, from cars to trucks. Let's explore those types.
##
## 01 - Motorcycle 02 - Automobile
## 7947 704614
## 03 - Station Wagon 04 - Limousine
## 14740 575
## 05 - Light Duty Truck 06 - Heavy Duty Truck
## 50439 8391
## 07 - Truck/Road Tractor 08 - Recreational Vehicle
## 869 3157
## 09 - Farm Vehicle 10 - Transit Bus
## 66 280
## 11 - Cross Country Bus 12 - School Bus
## 45 129
## 13 - Ambulance 13 - Ambulance(Emerg)
## 1 5
## 14 - Ambulance 14 - Ambulance(Non-Emerg)
## 2 8
## 15 - Fire Vehicle 15 - Fire(Emerg)
## 3 4
## 16 - Fire(Non-Emerg) 17 - Police(Emerg)
## 3 3
## 18 - Police Vehicle 18 - Police(Non-Emerg)
## 4 7
## 19 - Moped 20 - Commercial Rig
## 980 408
## 21 - Tandem Trailer 22 - Mobile Home
## 55 18
## 23 - Travel/Home Trailer 24 - Camper
## 17 10
## 25 - Utility Trailer 26 - Boat Trailer
## 856 40
## 27 - Farm Equipment 28 - Other
## 90 17523
## 29 - Unknown
## 5409
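A table like the one above can be produced with base R's `table()`. The sketch below uses a small made-up vector in place of the real `Vehicle.Type` column, so the counts are illustrative only.

```r
# Toy stand-in for the Vehicle.Type column. The labels are taken from the
# table above, but the rows themselves are made up for illustration.
vehicle_type <- c("02 - Automobile", "02 - Automobile", "02 - Automobile",
                  "05 - Light Duty Truck", "01 - Motorcycle", "02 - Automobile")

# One count per vehicle type, then sorted so the most common type comes first.
type_counts <- table(vehicle_type)
type_counts_sorted <- sort(type_counts, decreasing = TRUE)
most_common <- names(type_counts_sorted)[1]

print(type_counts_sorted)
```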
We can see from the table that Automobile is the most frequently occurring type, which is what we would expect.
Let's see how it compares to the other types visually.
On a log scale:
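Because automobiles outnumber every other type by orders of magnitude, a log scale keeps the rare types visible. This is a minimal sketch using four counts copied from the table above; the helper name `plot_counts_log` is hypothetical, and the plotting call is just one way to draw such a chart.

```r
# Four counts copied from the vehicle type table above.
counts <- c("02 - Automobile"       = 704614,
            "05 - Light Duty Truck" = 50439,
            "01 - Motorcycle"       = 7947,
            "12 - School Bus"       = 129)

# On a log10 scale the counts span roughly 2 to 6 instead of 129 to 704614.
log_counts <- log10(counts)
print(round(log_counts, 2))

# Hypothetical helper: bar chart with a logarithmic y axis.
plot_counts_log <- function(x) {
  barplot(sort(x, decreasing = TRUE), log = "y", las = 2, cex.names = 0.7,
          ylab = "Violations (log scale)")
}

# Only draw when run interactively, so the script also works in batch mode.
if (interactive()) plot_counts_log(counts)
```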
We can see that females account for almost exactly half as many violations as males in total.
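The ratio can be checked directly from the `Gender` counts in the data summary (F, M and U). The variable names below are only for illustration.

```r
# Gender counts copied from the data summary.
gender_counts <- c(F = 271463, M = 544214, U = 1021)

# Ratio of female to male violations, and each group's share of the total.
female_to_male <- gender_counts[["F"]] / gender_counts[["M"]]
shares <- round(100 * gender_counts / sum(gender_counts), 1)

print(round(female_to_male, 3))
print(shares)
```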
Perhaps the most interesting variable in the dataset: when do violations happen?
This data runs from 2012 to 2016, so there may be some patterns, but the raw plot is too noisy to show anything clearly. Let's smooth it and try again.
The default smoother doesn't help much because there are too many data points. Let's group by week, take the average over each week, and see.
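One way to do the weekly averaging is sketched below in base R: count violations per day, assign each day to the Monday that starts its week, and average the daily counts within each week. The dates are made up, and days with no violations are simply absent from the average here, which may differ from how the original plot was built.

```r
# Made-up violation dates, one element per violation.
dates <- as.Date(c("2015-03-16", "2015-03-16", "2015-03-17",
                   "2015-03-23", "2015-03-23", "2015-03-23"))

# Count violations per day.
daily <- table(dates)
daily_df <- data.frame(day = as.Date(names(daily)), n = as.vector(daily))

# Label each day with the Monday that starts its week.
daily_df$week <- as.Date(as.character(cut(daily_df$day, breaks = "week")))

# Average daily count within each week (only days that appear in the data).
weekly_mean <- tapply(daily_df$n, format(daily_df$week), mean)
print(weekly_mean)
```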
Much better. If you look closely, there may be a pattern here,
but we will look into that shortly. Let's try grouping by month too.
Now the pattern is clear. To make it even clearer, let's group by year and plot
the years over each other.
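Plotting the years over each other amounts to building a month-by-year table of counts and drawing one line per year. The sketch below uses made-up dates; `matplot()` is one base R option for the overlay, not necessarily what produced the original figure.

```r
# Made-up violation dates spanning two years.
d <- as.Date(c("2012-05-01", "2012-05-15", "2012-10-03",
               "2013-05-07", "2013-05-09", "2013-05-20", "2013-10-11"))

# Rows are months, columns are years.
by_month_year <- table(month = as.integer(format(d, "%m")),
                       year  = format(d, "%Y"))

# Month with the highest count in each year.
peak_month <- apply(by_month_year, 2,
                    function(col) rownames(by_month_year)[which.max(col)])
print(by_month_year)
print(peak_month)

# One line per year on a shared month axis (drawn only in interactive use).
if (interactive()) {
  m <- unclass(by_month_year)
  matplot(as.integer(rownames(m)), m, type = "b", pch = 1,
          xlab = "Month", ylab = "Violations")
}
```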
We can see that violations increase over the years, and there seems to be a certain time of year when violations peak.
Here we can see that May has the most violations of the year, followed by October. Could that be due to an increase in people who travel
there in the summer? Or is it simply the start of summer, when people go out more?
In 2015 something was different, and the peak was no longer in May.
Let's do the same grouped by week.
It's not very clear, so let's try something else.
From this plot we can clearly see how much each week differs from year to year.
Let's make it clearer by averaging over days for the same time of day.
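Averaging over days for the same time can be sketched as follows: count violations per day and hour, then take the mean across days for each hour. The rows are made up, and grouping by hour (rather than by minute) is an assumption chosen to keep the example small.

```r
# Made-up rows using the same column names and formats as the dataset.
v <- data.frame(
  Date.Of.Violation = c("3/16/2015", "3/16/2015", "3/16/2015",
                        "3/17/2015", "3/17/2015", "3/17/2015", "3/17/2015"),
  Time.Of.Violation = c("06:05:00", "06:40:00", "23:20:00",
                        "06:10:00", "23:00:00", "23:30:00", "23:45:00"),
  stringsAsFactors = FALSE
)

# Hour of day, taken from the first two characters of the time string.
hour <- as.integer(substr(v$Time.Of.Violation, 1, 2))

# Rows are days, columns are hours; each cell is a count of violations.
per_day_hour <- table(v$Date.Of.Violation, hour)

# Average count per hour across days.
mean_per_hour <- colMeans(per_day_hour)
print(mean_per_hour)
```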
We can see from this plot at 6:00 AM most days the number of violations
We can see that only 3.9% of all violations caused personal injury or property damage.
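A percentage like this can be computed from a combined flag. The sketch below assumes the “Caused.Any.Damage” definition given elsewhere in this report (personal injury, property damage, or an accident) and uses made-up rows, so the resulting share is not the real 3.9%.

```r
# Made-up rows with the three Yes/No columns from the dataset.
v <- data.frame(
  Personal.Injury    = c("No", "Yes", "No", "No",  "No"),
  Property.Damage    = c("No", "No",  "No", "Yes", "No"),
  Caused.an.Accident = c("No", "No",  "No", "No",  "No"),
  stringsAsFactors = FALSE
)

# A violation counts as damaging if any of the three flags is "Yes".
any_damage <- v$Personal.Injury == "Yes" |
  v$Property.Damage == "Yes" |
  v$Caused.an.Accident == "Yes"
v$Caused.Any.Damage <- ifelse(any_damage, "Yes", "No")

# Share of violations with any damage, as a percentage.
share <- 100 * mean(v$Caused.Any.Damage == "Yes")
print(share)
```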
There are 816,698 observations in the dataset, each representing a single violation, with 24 variables.
My main features of interest are the date and time of each violation and the damage it caused. I would like to see how violations vary over the year, and whether there is a certain period when many violations happen.
I created 5 variables to help me group by date:
Date_new: the same date, but in Date format
Date_month: the date of the first day of the month of the violation, e.g. 2013-12-01
Date_week: similar to Date_month, but for weeks
month_only: the month number, e.g. 12
week_only: the number of the week within the year in which the violation happened
I also created three time variables:
Time_new: the time in POSIXt format
Tme_new2: the time in chr format
time_only: the time in factor format
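The date variables could be derived along these lines in base R. This is a sketch under assumptions: the input format `%m/%d/%Y` is inferred from the data summary, weeks are taken to start on Monday, and `%U` is just one of several week-numbering conventions, so the original code may differ.

```r
# Made-up rows in the same format as the Date.Of.Violation column.
v <- data.frame(
  Date.Of.Violation = c("3/17/2015", "3/18/2015", "3/24/2015", "5/20/2014"),
  stringsAsFactors = FALSE
)

# Date_new: the same date as a proper Date object.
v$Date_new <- as.Date(v$Date.Of.Violation, format = "%m/%d/%Y")

# Date_month: first day of the month of the violation.
v$Date_month <- as.Date(format(v$Date_new, "%Y-%m-01"))

# Date_week: first day (Monday) of the week of the violation.
v$Date_week <- as.Date(as.character(cut(v$Date_new, breaks = "week")))

# month_only and week_only: plain numbers for grouping across years.
v$month_only <- as.integer(format(v$Date_new, "%m"))
v$week_only  <- as.integer(format(v$Date_new, "%U"))

print(v)
```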
I created a “Caused.Any.Damage” variable to tell whether a violation caused any personal injury, property damage, or an accident (Yes, No).
### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
Let's start by plotting the location of each violation.
We can see there are two main locations that the points are centered around.
* Plotting the two main parts one at a time, and avoiding outliers:
We can see that they more or less form a map.
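The data summary hints at why there might be two clusters: `Latitude` has a minimum of -77.64, and `Geolocation` contains both (39.109775, -76.91044) and (-76.91044, 39.109775), which suggests that some rows have latitude and longitude swapped. The sketch below works under that assumption. The bounding box is a rough, hypothetical one chosen for illustration and is not taken from the original analysis.

```r
# Coordinates taken from the Geolocation summary, plus one made-up outlier.
pts <- data.frame(
  Latitude  = c(39.045425, -76.91044, 39.109775, 0),
  Longitude = c(-76.9907366666667, 39.109775, -76.91044, 0)
)

# Hypothetical bounding box around the area of interest.
lat_range <- c(38.9, 39.4)
lon_range <- c(-77.6, -76.8)
in_box <- function(lat, lon) {
  lat >= lat_range[1] & lat <= lat_range[2] &
    lon >= lon_range[1] & lon <= lon_range[2]
}

# Rows that only fall inside the box once the two columns are exchanged.
swapped <- !in_box(pts$Latitude, pts$Longitude) &
  in_box(pts$Longitude, pts$Latitude)

# Swap those rows back, then drop whatever is still outside the box.
fixed <- pts
fixed$Latitude[swapped]  <- pts$Longitude[swapped]
fixed$Longitude[swapped] <- pts$Latitude[swapped]
clean <- fixed[in_box(fixed$Latitude, fixed$Longitude), ]
print(clean)
```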
Let's see where accidents occur compared to where violations occur.
It more or less traces the map of the state, with major roads highlighted. Zooming in to get a closer picture:
Here we can also see that it highlights streets. If this were coupled with a layer showing the map of the state, it would be much more interesting; however, I will not do this in this project.
## , , = 2012
##
##
## 1 2 3 4 5 6 7 8 9 10 11
## F 3445 3669 4009 4631 5637 4366 3743 3699 3642 4364 4387
## M 6729 6600 7535 8519 11751 8694 7944 8413 8414 8782 9196
## U 79 73 80 20 9 12 7 4 11 13 8
##
## 12
## F 4135
## M 8741
## U 1
##
## , , = 2013
##
##
## 1 2 3 4 5 6 7 8 9 10 11
## F 4496 4286 5501 4960 5849 4593 5001 5498 5999 5903 5463
## M 8575 8616 10350 9876 12564 8949 10731 11344 12127 11529 11553
## U 27 7 10 6 4 12 16 50 44 16 11
##
## 12
## F 5324
## M 11225
## U 4
##
## , , = 2014
##
##
## 1 2 3 4 5 6 7 8 9 10 11
## F 4945 5246 6566 7429 7712 5982 6751 6031 6089 7087 6370
## M 10455 10628 12989 13967 14873 11588 12816 11780 12310 13019 12398
## U 61 14 17 9 16 18 7 10 16 44 28
##
## 12
## F 5578
## M 10859
## U 9
##
## , , = 2015
##
##
## 1 2 3 4 5 6 7 8 9 10 11
## F 6346 5539 7075 7021 6827 6507 6008 6739 6678 6802 6517
## M 11891 10820 13668 14383 14068 12624 13411 14503 13278 13107 13368
## U 10 36 20 14 17 24 10 71 15 9 5
##
## 12
## F 5886
## M 11906
## U 12
##
## , , = 2016
##
##
## 1 2 3 4 5 6 7 8 9 10 11
## F 5132 0 0 0 0 0 0 0 0 0 0
## M 10748 0 0 0 0 0 0 0 0 0 0
## U 5 0 0 0 0 0 0 0 0 0 0
##
## 12
## F 0
## M 0
## U 0
For both genders, May is often the peak month (for both in 2012 and 2014), although not always: in 2015 neither gender peaks in May.
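A three-way table like the one above (gender by month, one slice per year) can be built with `table()`, and the peak month per gender and year read off with `apply()`. The rows below are made up, so the result only illustrates the mechanics.

```r
# Made-up rows: gender plus a parsed violation date.
v <- data.frame(
  Gender   = c("F", "M", "M", "M", "F", "M"),
  Date_new = as.Date(c("2012-05-03", "2012-05-10", "2012-05-11",
                       "2012-06-01", "2013-05-02", "2013-05-09")),
  stringsAsFactors = FALSE
)

# Dimensions: gender x month x year.
tab <- table(v$Gender,
             as.integer(format(v$Date_new, "%m")),
             format(v$Date_new, "%Y"))

# For each gender and year, the month with the highest count.
peak <- apply(tab, c(1, 3), function(x) dimnames(tab)[[2]][which.max(x)])
print(tab)
print(peak)
```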
#### Vehicle type and Color
Some colors, like copper and chrome, appear only on automobiles. Other colors, like black, white, and gray, are common across all types.
## Violation Damage
In this part I will discuss which types of violations occur and how much damage they cause.
### Most common violation types
## Source: local data frame [15 x 3]
## Groups: Violation.Description [15]
##
## Violation.Description
## <fctr>
## 1 DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC CONTROL DEVICE INSTRUCTIONS
## 2 FAILURE TO DISPLAY REGISTRATION CARD UPON DEMAND BY POLICE OFFICER
## 3 DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION
## 4 FAILURE OF INDIVIDUAL DRIVING ON HIGHWAY TO DISPLAY LICENSE TO UNIFORMED PO
## 5 DRIVER FAILURE TO STOP AT STOP SIGN LINE
## 6 OPERATOR NOT RESTRAINED BY SEATBELT
## 7 DISPLAYING EXPIRED REGISTRATION PLATE ISSUED BY ANY STATE
## 8 PERSON DRIVING MOTOR VEHICLE ON HIGHWAY OR PUBLIC USE PROPERTY ON SUSPENDED
## 9 DRIVER USING HANDS TO USE HANDHELD TELEPHONE WHILEMOTOR VEHICLE IS IN MOTIO
## 10 EXCEEDING THE POSTED SPEED LIMIT OF 30 MPH
## 11 EXCEEDING THE POSTED SPEED LIMIT OF 40 MPH
## 12 DRIVING VEHICLE ON HIGHWAY WITHOUT CURRENT REGISTRATION PLATES AND VALIDATI
## 13 FAILURE OF VEH. ON HWY. TO DISPLAY LIGHTED LAMPS, ILLUMINATING DEVICE IN UN
## 14 EXCEEDING MAXIMUM SPEED: 39 MPH IN A POSTED 30 MPH ZONE
## 15 PERSON DRIVING MOTOR VEHICLE WHILE LICENSE SUSPENDED UNDER 17-106, 26-204,
## # ... with 2 more variables: Caused.Any.Damage <chr>, n <int>
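A ranking like the one above can be approximated in base R by counting each combination of description and damage flag and sorting by the count. The original output appears to come from a grouped data frame; this base R version is an alternative sketch on made-up rows, not the original code.

```r
# Made-up rows; the descriptions are taken from the list above.
v <- data.frame(
  Violation.Description = c("DRIVER FAILURE TO STOP AT STOP SIGN LINE",
                            "DRIVER FAILURE TO STOP AT STOP SIGN LINE",
                            "DRIVER FAILURE TO STOP AT STOP SIGN LINE",
                            "OPERATOR NOT RESTRAINED BY SEATBELT",
                            "OPERATOR NOT RESTRAINED BY SEATBELT"),
  Caused.Any.Damage = c("No", "No", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Count every (description, damage) combination.
counts <- as.data.frame(table(v$Violation.Description, v$Caused.Any.Damage),
                        stringsAsFactors = FALSE)
names(counts) <- c("Violation.Description", "Caused.Any.Damage", "n")

# Drop combinations that never occur, then keep the 15 most frequent.
counts <- counts[counts$n > 0, ]
top <- head(counts[order(-counts$n), ], 15)
print(top)
```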
I notice something here: most violations are by males, but on the days when males commit few violations, females commit many, and vice versa. We can see it here in the spikes: a positive spike for males is often coupled with a negative spike for females. This should be looked at more closely.
It seems that most alcohol violations, for both men and women, happen between 2 PM and 5 PM.
In the plot of violation date against gender, I noticed that when the number of violations by women is high, the number by men is usually low, and vice versa. I did not expect that.
This graph shows the count of violations in each minute. It shows when violations generally happen during the day.
Here are some things to notice about this graph:
* From this plot we can see that the number of violations increased over the years until it reached a peak in 2014, then started to come down in 2015.
This plot shows the locations of violations in one particular area, zoomed in. I chose it because the violations seem to draw the map of the streets.
You can identify the major streets just by looking at the violations, and it looks oddly like blood vessels.
This plot may not convey a lot of information; however, I think it is very interesting, and that's why I chose it.
The traffic violation dataset contains information on more than 800,000 violations that occurred from 2012 to 2016. It shows how much violations increased over the years and at which times violations occur most often; I learned that May and October see the most violations.
I also used this data to find the most popular car makes and models.
It seems this data could be very useful for reducing violations and understanding their causes, for example by analysing the locations where violations occur most often.
One struggle I had with this dataset is that most of its variables are categorical rather than continuous. This made it hard to derive insights and make comparisons, so I relied heavily on the count of violations after grouping by each category, and I still found very interesting insights (for example in date/time and location).
One improvement, which could be future work, is to combine this data with labeled map data so that we can see clearly where the violations occur.
The violation descriptions could also be grouped into categories (e.g. speeding, ignoring traffic lights, reckless driving) and studied further to help reduce violations and accidents and make traffic better for everyone.
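Grouping descriptions into categories could start with simple keyword matching. The function name `categorize`, the category names, and the keywords below are all hypothetical choices for illustration; a real grouping would need a much fuller keyword list.

```r
# Hypothetical keyword-based grouping of violation descriptions.
categorize <- function(desc) {
  ifelse(grepl("SPEED", desc), "speeding",
  ifelse(grepl("REGISTRATION", desc), "registration",
  ifelse(grepl("SEATBELT", desc), "seatbelt", "other")))
}

# Descriptions taken from the most common violation types in this report.
desc <- c("EXCEEDING THE POSTED SPEED LIMIT OF 30 MPH",
          "DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGISTRATION",
          "OPERATOR NOT RESTRAINED BY SEATBELT",
          "DRIVER FAILURE TO STOP AT STOP SIGN LINE")

category <- categorize(desc)
print(table(category))
```

Counting by `category` instead of by the full description would give far fewer, larger groups to compare.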